Frequent Items Mining Algorithm Over High Speed Network Flows Based on Double Hash Method
Abstract
In high-speed backbone networks, the number of network flows grows rapidly as link speeds increase. Because hardware computing and storage resources are limited, identifying and measuring large flows promptly and accurately in massive traffic data has become a hot issue in high-speed network flow measurement. In this paper, we propose a new algorithm, based on a double hash method, for identifying frequent items among large flows. It addresses two defects of the MF algorithm: it easily produces false positives, and its frequent updates put heavy pressure on the system. We analyze the complexity and the false positive rate of the algorithm, and use simulation to study how parameter configuration affects the statistical accuracy of large-flow frequent items and the discard rate. The theoretical analysis and the simulation results indicate that, compared with the MF algorithm, our algorithm identifies large-flow frequent items more accurately and satisfies the needs of actual measurement.
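The abstract does not give the details of the proposed algorithm, so the following is only an illustrative sketch of the general idea of a hash-based counting filter whose per-stage indices are derived from two base hashes (double hashing). The class name `DoubleHashFilter`, its parameters (`stages`, `width`, `threshold`), and the use of SHA-256 are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' method.

```python
import hashlib


class DoubleHashFilter:
    """Illustrative counting filter (hypothetical, not the paper's algorithm).

    Each flow updates one counter per stage. The stage indices come from
    two base hashes h1 and h2 via index_i = (h1 + i * h2) mod width.
    """

    def __init__(self, stages=4, width=1024, threshold=100):
        self.stages = stages
        self.width = width
        self.threshold = threshold
        # One independent counter array per stage.
        self.counters = [[0] * width for _ in range(stages)]

    def _indices(self, flow_id):
        digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force h2 to be odd so the stride between stages is never zero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.width for i in range(self.stages)]

    def add(self, flow_id, size=1):
        """Record `size` units (packets or bytes) for a flow."""
        for stage, index in enumerate(self._indices(flow_id)):
            self.counters[stage][index] += size

    def estimate(self, flow_id):
        """Upper-bound estimate: the smallest counter the flow maps to."""
        return min(
            self.counters[stage][index]
            for stage, index in enumerate(self._indices(flow_id))
        )

    def is_large(self, flow_id):
        """Flag the flow as large once its estimate reaches the threshold."""
        return self.estimate(flow_id) >= self.threshold


if __name__ == "__main__":
    f = DoubleHashFilter(stages=4, width=1024, threshold=100)
    for _ in range(150):
        f.add("10.0.0.1->10.0.0.2")
    print(f.is_large("10.0.0.1->10.0.0.2"))
```

Because counters are only ever incremented, the estimate never falls below a flow's true count; it can exceed it when other flows share all of the same counters, which is the source of false positives in filters of this kind.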
Similar resources
HASH-MINE: A New Framework for Discovery of Frequent Itemsets
Discovery of frequently occurring subsets of items, called itemsets, is the core of many data mining methods. Most of the previous studies adopt Apriori-like algorithms, which iteratively generate candidate itemsets and check their occurrence frequencies in the database. These approaches suffer from serious costs of repeated passes over the analyzed database. To address this problem, we propose...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses. It alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
An Efficient Association Rule Mining Using the H-BIT Array Hashing Algorithm
Association Rule Mining (ARM) finds interesting relationships between the presence of various items in a given database. Apriori is the traditional algorithm for learning association rules. However, it suffers from the number of database scans it requires and the large number of candidate itemsets it generates. Each level of candidate itemsets requires separate memory locations. Hash Based Frequent Itemsets Quadratic Pro...